{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quantize Model with Intel Neural Compressor\n",
    "### Prepare Environment\n",
    "Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for PyTorch. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.<br><br>\n",
    "By default, Intel Neural Compressor (INC) is not installed with BigDL-Nano. If you decide to use it as your quantization backend, install it first:\n",
    "```shell\n",
    "pip install neural-compressor==1.11.0\n",
    "```\n",
    "When using ONNXRuntime as the backend, you also need `onnx` and `onnxruntime`, plus `onnxruntime-extensions`, which INC requires as a dependency:\n",
    "```shell\n",
    "pip install onnx onnxruntime onnxruntime-extensions\n",
    "```\n"
   ]
  },
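  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick way to confirm that the optional packages above are importable before running the rest of the notebook is sketched below. It assumes `neural_compressor`, `onnx`, `onnxruntime` and `onnxruntime_extensions` are the import names of the pip packages listed above, and `check_backends` is a hypothetical helper, not part of BigDL-Nano. It only checks that each module can be found, not which version is installed.\n",
    "\n",
    "```python\n",
    "import importlib.util\n",
    "\n",
    "# Assumed import names of the optional packages installed above.\n",
    "OPTIONAL_MODULES = ['neural_compressor', 'onnx', 'onnxruntime', 'onnxruntime_extensions']\n",
    "\n",
    "\n",
    "def check_backends(module_names=OPTIONAL_MODULES):\n",
    "    # Hypothetical helper: map each module name to whether it can be found.\n",
    "    return {name: importlib.util.find_spec(name) is not None for name in module_names}\n",
    "\n",
    "\n",
    "for name, available in check_backends().items():\n",
    "    print(name, 'installed' if available else 'missing')\n",
    "```"
   ]
  },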
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data\n",
    "We use the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) for this demo. It contains 37 categories with roughly 200 images per class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import torch\n",
    "from torchvision.io import read_image\n",
    "from torchvision import transforms\n",
    "from torchvision.datasets import OxfordIIITPet\n",
    "from torch.utils.data.dataloader import DataLoader\n",
    "\n",
    "train_transform = transforms.Compose([transforms.Resize(256),\n",
    "                                      transforms.RandomCrop(224),\n",
    "                                      transforms.RandomHorizontalFlip(),\n",
    "                                      transforms.ColorJitter(brightness=.5, hue=.3),\n",
    "                                      transforms.ToTensor(),\n",
    "                                      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])\n",
    "val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])\n",
    "# Apply data augmentation to the train_dataset\n",
    "train_dataset = OxfordIIITPet(root=\"/tmp/data\", transform=train_transform, download=True)\n",
    "val_dataset = OxfordIIITPet(root=\"/tmp/data\", transform=val_transform)\n",
    "\n",
    "# obtain training indices that will be used for validation\n",
    "indices = torch.randperm(len(train_dataset))\n",
    "val_size = len(train_dataset) // 4\n",
    "train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])\n",
    "val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])\n",
    "\n",
    "# prepare data loaders\n",
    "train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
    "\n",
    "DEV_RUN = os.environ.get('DEV_RUN', '0').lower() in ('1', 'true', 'yes')"
   ]
  },
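  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above builds `val_dataset` by slicing the same random permutation as the training subset, so the two `Subset` objects never share an image even though they wrap two dataset objects with different transforms. The sketch below reproduces that indexing on a small stand-in `TensorDataset` (an assumption, used in place of `OxfordIIITPet`) and checks that the split is disjoint; `split_train_val` and the fixed seed are illustrative choices, not part of the original code.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.utils.data import Subset, TensorDataset\n",
    "\n",
    "\n",
    "def split_train_val(train_view, val_view, seed=0):\n",
    "    # Hypothetical helper: both views must index the same underlying samples.\n",
    "    assert len(train_view) == len(val_view)\n",
    "    generator = torch.Generator().manual_seed(seed)\n",
    "    indices = torch.randperm(len(train_view), generator=generator).tolist()\n",
    "    val_size = len(train_view) // 4  # hold out a quarter, as in the cell above\n",
    "    return Subset(train_view, indices[:-val_size]), Subset(val_view, indices[-val_size:])\n",
    "\n",
    "\n",
    "# Stand-in dataset: sample i is simply the integer i.\n",
    "stand_in = TensorDataset(torch.arange(100))\n",
    "train_part, val_part = split_train_val(stand_in, stand_in)\n",
    "train_ids = {int(train_part[i][0]) for i in range(len(train_part))}\n",
    "val_ids = {int(val_part[i][0]) for i in range(len(val_part))}\n",
    "print(len(train_ids), len(val_ids), train_ids.isdisjoint(val_ids))  # prints: 75 25 True\n",
    "```"
   ]
  },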
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom Model\n",
    "For the model, we use a pretrained `torchvision.models.resnet18`. For more details, refer to the [torchvision documentation](https://pytorch.org/vision/0.12/generated/torchvision.models.resnet18.html?highlight=resnet18)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: False, used: False\n",
      "TPU available: False, using: 0 TPU cores\n",
      "IPU available: False, using: 0 IPUs\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py:532: LightningDeprecationWarning: `trainer.fit(train_dataloader)` is deprecated in v1.4 and will be removed in v1.6. Use `trainer.fit(train_dataloaders)` instead. HINT: added 's'\n",
      "  \"`trainer.fit(train_dataloader)` is deprecated in v1.4 and will be removed in v1.6.\"\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/pytorch_lightning/trainer/configuration_validator.py:101: UserWarning: you defined a validation_step but have no val_dataloader. Skipping val loop\n",
      "  rank_zero_warn(f\"you defined a {step_name} but have no {loader_name}. Skipping {stage} loop\")\n",
      "\n",
      "  | Name  | Type             | Params\n",
      "-------------------------------------------\n",
      "0 | model | ResNet           | 11.2 M\n",
      "1 | loss  | CrossEntropyLoss | 0     \n",
      "-------------------------------------------\n",
      "11.2 M    Trainable params\n",
      "0         Non-trainable params\n",
      "11.2 M    Total params\n",
      "44.782    Total estimated model params size (MB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                           "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:106: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n",
      "  f\"The dataloader, {name}, does not have many workers which may be a bottleneck.\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 4: 100%|██████████| 87/87 [00:42<00:00,  2.08it/s, loss=0.308, v_num=19]  \n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([29, 18])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from torchvision.models import resnet18\n",
    "from bigdl.nano.pytorch import Trainer\n",
    "from torchmetrics.classification import MulticlassAccuracy\n",
    "model_ft = resnet18(pretrained=True)\n",
    "num_ftrs = model_ft.fc.in_features\n",
    "\n",
    "# Here the size of each output sample is set to 37.\n",
    "model_ft.fc = torch.nn.Linear(num_ftrs, 37)\n",
    "loss_ft = torch.nn.CrossEntropyLoss()\n",
    "optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\n",
    "\n",
    "# Compile the model with the loss function, optimizer and accuracy metric.\n",
    "model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[MulticlassAccuracy(num_classes=37)])\n",
    "trainer = Trainer(max_epochs=5,\n",
    "                  fast_dev_run=DEV_RUN)  # run a single batch as a quick smoke test when DEV_RUN is set\n",
    "trainer.fit(model, train_dataloaders=train_dataloader)\n",
    "\n",
    "# Inference/Prediction\n",
    "x = torch.stack([val_dataset[0][0], val_dataset[1][0]])\n",
    "model_ft.eval()\n",
    "y_hat = model_ft(x)\n",
    "y_hat.argmax(dim=1)"
   ]
  },
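  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fine-tuning setup above keeps the ResNet-18 backbone and swaps only the final fully connected layer for a 37-way classifier. The sketch below checks that idea in isolation: it builds an untrained `resnet18()` (so nothing is downloaded), replaces `fc`, and runs one SGD step with the same hyperparameters on random stand-in images. The 64x64 input size and random labels are assumptions chosen to keep it fast, and this plain PyTorch step is not a description of what `Trainer.fit` does internally.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torchvision.models import resnet18\n",
    "\n",
    "NUM_CLASSES = 37  # Oxford-IIIT Pet categories\n",
    "\n",
    "torch.manual_seed(0)\n",
    "net = resnet18()  # untrained weights; the notebook loads pretrained ones\n",
    "net.fc = torch.nn.Linear(net.fc.in_features, NUM_CLASSES)\n",
    "loss_fn = torch.nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\n",
    "\n",
    "# Random stand-in batch: 4 RGB images and integer class labels.\n",
    "images = torch.randn(4, 3, 64, 64)\n",
    "labels = torch.randint(0, NUM_CLASSES, (4,))\n",
    "\n",
    "# One training step: forward, loss, backward, parameter update.\n",
    "net.train()\n",
    "logits = net(images)\n",
    "loss = loss_fn(logits, labels)\n",
    "optimizer.zero_grad()\n",
    "loss.backward()\n",
    "optimizer.step()\n",
    "print(logits.shape, loss.item())\n",
    "```"
   ]
  },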
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Quantization without an extra accelerator\n",
    "To use INC as your quantization engine, leave `accelerator` as `None` (the call below simply omits it) or set it to `'onnxruntime'`.<br>\n",
    "Without an extra accelerator, `InferenceOptimizer.quantize()` returns a PyTorch module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-07-01 09:14:43 [INFO] Generate a fake evaluation function.\n",
      "2022-07-01 09:14:43 [INFO] Pass query framework capability elapsed time: 183.81 ms\n",
      "2022-07-01 09:14:43 [INFO] Get FP32 model baseline.\n",
      "2022-07-01 09:14:43 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.\n",
      "2022-07-01 09:14:43 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0000]\n",
      "2022-07-01 09:14:44 [WARNING] Please note that calibration sampling size 100 isn't divisible exactly by batch size 32. So the real sampling size is 128.\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/torch/nn/quantized/_reference/modules/conv.py:49: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  torch.tensor(weight_qparams[\"scale\"], dtype=torch.float, device=device))\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/torch/nn/quantized/_reference/modules/conv.py:52: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  torch.tensor(weight_qparams[\"zero_point\"], dtype=torch.int, device=device))\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/torch/nn/quantized/_reference/modules/linear.py:41: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  torch.tensor(weight_qparams[\"scale\"], dtype=torch.float, device=device))\n",
      "/opt/conda/envs/testVscode/lib/python3.7/site-packages/torch/nn/quantized/_reference/modules/linear.py:46: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  dtype=torch.int, device=device))\n",
      "2022-07-01 09:14:46 [INFO] |********Mixed Precision Statistics*******|\n",
      "2022-07-01 09:14:46 [INFO] +------------------------+--------+-------+\n",
      "2022-07-01 09:14:46 [INFO] |        Op Type         | Total  |  INT8 |\n",
      "2022-07-01 09:14:46 [INFO] +------------------------+--------+-------+\n",
      "2022-07-01 09:14:46 [INFO] |  quantize_per_tensor   |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] |       ConvReLU2d       |   9    |   9   |\n",
      "2022-07-01 09:14:46 [INFO] |       MaxPool2d        |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] |         Conv2d         |   11   |   11  |\n",
      "2022-07-01 09:14:46 [INFO] |        add_relu        |   8    |   8   |\n",
      "2022-07-01 09:14:46 [INFO] |   AdaptiveAvgPool2d    |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] |        flatten         |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] |         Linear         |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] |       dequantize       |   1    |   1   |\n",
      "2022-07-01 09:14:46 [INFO] +------------------------+--------+-------+\n",
      "2022-07-01 09:14:46 [INFO] Pass quantize model elapsed time: 2347.22 ms\n",
      "2022-07-01 09:14:46 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 1.0000|1.0000, Duration (seconds) (int8|fp32): 0.0000|0.0000], Best tune result is: [Accuracy: 1.0000, Duration (seconds): 0.0000]\n",
      "2022-07-01 09:14:46 [INFO] |**********************Tune Result Statistics**********************|\n",
      "2022-07-01 09:14:46 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:46 [INFO] |     Info Type      | Baseline | Tune 1 result | Best tune result |\n",
      "2022-07-01 09:14:46 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:46 [INFO] |      Accuracy      | 1.0000   |    1.0000     |     1.0000       |\n",
      "2022-07-01 09:14:46 [INFO] | Duration (seconds) | 0.0000   |    0.0000     |     0.0000       |\n",
      "2022-07-01 09:14:46 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:46 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.\n",
      "2022-07-01 09:14:46 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.\n",
      "2022-07-01 09:14:46 [INFO] Save deploy yaml to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/deploy.yaml\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([29, 18])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from bigdl.nano.pytorch import InferenceOptimizer\n",
    "q_model = InferenceOptimizer.quantize(model, calib_data=train_dataloader)\n",
    "y_hat = q_model(x)\n",
    "y_hat.argmax(dim=1)"
   ]
  },
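  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because no evaluation function is passed to `InferenceOptimizer.quantize()`, the INC log above reports `Generate a fake evaluation function`, so the 1.0000 accuracies appear to be placeholders rather than a real measurement, and comparing the predictions for `x` is only a spot check. A slightly broader check is to measure how often the quantized model's top-1 prediction agrees with the FP32 model over a whole dataloader. The helper `top1_agreement` below is hypothetical, and the tiny linear models and random data are stand-ins so that the sketch runs on its own.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "\n",
    "\n",
    "def top1_agreement(reference_model, candidate_model, dataloader):\n",
    "    # Fraction of samples on which both models predict the same class.\n",
    "    matches, total = 0, 0\n",
    "    with torch.no_grad():\n",
    "        for inputs, _ in dataloader:\n",
    "            ref_pred = reference_model(inputs).argmax(dim=1)\n",
    "            cand_pred = candidate_model(inputs).argmax(dim=1)\n",
    "            matches += (ref_pred == cand_pred).sum().item()\n",
    "            total += inputs.shape[0]\n",
    "    return matches / total\n",
    "\n",
    "\n",
    "torch.manual_seed(0)\n",
    "# Stand-ins: two tiny linear classifiers and random labelled data.\n",
    "reference = torch.nn.Linear(16, 37).eval()\n",
    "other = torch.nn.Linear(16, 37).eval()\n",
    "loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.zeros(64, dtype=torch.long)), batch_size=32)\n",
    "print(top1_agreement(reference, reference, loader))  # prints 1.0\n",
    "print(top1_agreement(reference, other, loader))\n",
    "```\n",
    "\n",
    "In this notebook, the same (untested) call could compare `model_ft` with `q_model`, or with `ort_q_model` from the next section, given a loader over the held-out images such as `DataLoader(val_dataset, batch_size=32)`, which the notebook itself does not create."
   ]
  },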
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Quantization with the ONNXRuntime accelerator\n",
    "With `accelerator='onnxruntime'`, `InferenceOptimizer.quantize()` returns a model quantized to lower precision (INT8 in the log below) whose inference runs in the ONNXRuntime engine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-07-01 09:14:48 [INFO] Generate a fake evaluation function.\n",
      "2022-07-01 09:14:48 [INFO] Get FP32 model baseline.\n",
      "2022-07-01 09:14:48 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.\n",
      "2022-07-01 09:14:48 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0000]\n",
      "2022-07-01 09:14:48 [WARNING] Please note that calibration sampling size 100 isn't divisible exactly by batch size 32. So the real sampling size is 128.\n",
      "tcmalloc: large alloc 1073741824 bytes == 0x557ba31c2000 @  0x7f24da9a1d3f 0x7f24da9d80c0 0x7f24da9db082 0x7f24da9db243 0x7f243446d16c 0x7f243463b8d4 0x7f24344851df 0x7f24344cf3c6 0x7f24344c79e4 0x7f24340e9cce 0x7f24340ea4e2 0x7f24340973d4 0x7f24340636e2 0x557b1f4b7e74 0x557b1f516507 0x557b1f4ce591 0x557b1f4e56d5 0x557b1f4836ad 0x557b1f4b1af1 0x557b1f4ce3a5 0x557b1f4e211a 0x557b1f483e03 0x557b1f4b1a40 0x557b1f4ce3a5 0x557b1f4e211a 0x557b1f4836ad 0x557b1f4b1af1 0x557b1f4ce3a5 0x557b1f4e211a 0x557b1f483d04 0x557b1f4b1a40\n",
      "2022-07-01 09:14:54 [INFO] |*******Mixed Precision Statistics******|\n",
      "2022-07-01 09:14:54 [INFO] +----------------------+--------+-------+\n",
      "2022-07-01 09:14:54 [INFO] |       Op Type        | Total  |  INT8 |\n",
      "2022-07-01 09:14:54 [INFO] +----------------------+--------+-------+\n",
      "2022-07-01 09:14:54 [INFO] |         Conv         |   20   |   20  |\n",
      "2022-07-01 09:14:54 [INFO] |        MatMul        |   1    |   1   |\n",
      "2022-07-01 09:14:54 [INFO] |       MaxPool        |   1    |   1   |\n",
      "2022-07-01 09:14:54 [INFO] |  GlobalAveragePool   |   1    |   1   |\n",
      "2022-07-01 09:14:54 [INFO] |         Add          |   9    |   9   |\n",
      "2022-07-01 09:14:54 [INFO] |    QuantizeLinear    |   3    |   3   |\n",
      "2022-07-01 09:14:54 [INFO] |   DequantizeLinear   |   3    |   3   |\n",
      "2022-07-01 09:14:54 [INFO] +----------------------+--------+-------+\n",
      "2022-07-01 09:14:54 [INFO] Pass quantize model elapsed time: 5679.3 ms\n",
      "2022-07-01 09:14:54 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 1.0000|1.0000, Duration (seconds) (int8|fp32): 0.0000|0.0000], Best tune result is: [Accuracy: 1.0000, Duration (seconds): 0.0000]\n",
      "2022-07-01 09:14:54 [INFO] |**********************Tune Result Statistics**********************|\n",
      "2022-07-01 09:14:54 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:54 [INFO] |     Info Type      | Baseline | Tune 1 result | Best tune result |\n",
      "2022-07-01 09:14:54 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:54 [INFO] |      Accuracy      | 1.0000   |    1.0000     |     1.0000       |\n",
      "2022-07-01 09:14:54 [INFO] | Duration (seconds) | 0.0000   |    0.0000     |     0.0000       |\n",
      "2022-07-01 09:14:54 [INFO] +--------------------+----------+---------------+------------------+\n",
      "2022-07-01 09:14:54 [INFO] Save tuning history to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/./history.snapshot.\n",
      "2022-07-01 09:14:54 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.\n",
      "2022-07-01 09:14:54 [INFO] Save deploy yaml to /home/projects/BigDL/python/nano/notebooks/pytorch/tutorial/nc_workspace/2022-07-01_09-14-43/deploy.yaml\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([29, 18])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ort_q_model = InferenceOptimizer.quantize(model, accelerator='onnxruntime', calib_data=train_dataloader)\n",
    "y_hat = ort_q_model(x)\n",
    "y_hat.argmax(dim=1)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.12 ('nano')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "75c4387adfc215da0f2d9d02c27ad9a4df553a9f0187eec0365fe565a2e50216"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
